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ABSTRACT 


We  consider  ^he  problem  of  estimating  a covariance  matrix  in  the 
standard  multivariate  normal  situation.  9«?M"oss  function  is  one 


obtained  naturally  from  the  problem  of  estimating  several  normal  mean 


vectors  in  an  empirical  Bayes  situation.  Estimators  which  dominate  any 


constant  multiple  of  the  sample  covariance  matrix  are  presented.  These 


estimators  work  by  shrinking  the  sample  eigenvalues  toward  a central 


value,  in  much  the  same 


a mean 


vector  shrinks  the  maximum  likelihood  estimators  toward  a common  value 
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MULTIVARIATE  EMPIRICAL  BAYES  AND  ESTIMATION  OF  COVARIANCE  MATRICES 


1.  INTRODUCTION  AND  SUMMARY 

The  problem  of  finding  multivariate  empirical  Bayes  estimators 
reduces  under  certain  circumstances  [1]  to  one  of  estimating  the 
inverse  of  an  unknown  covariance  matrix  X from  an  observed  p*p  co- 
variance  matrix  _S  having  the  Wishart  distribution  with  k degrees  of 
freedom  and  mean  kX 

(1.1)  .S  ~ Wp(2,  k) 


using  the  loss  function 


(1.2) 


ktr(E  *) 


We  assume  throughout  that  X exists,  and  that  k > p + 1.  The  usual 
estimator  of  X ^ is  the  best  multiple  of  _S  ^ , which  for  this  loss 
function  is 


(1.3) 


»-l  -1 

X = (k-p-l)S  . 


The  estimator  (1,3)  is  the  best  unbiased  estimator  of  X and  is  minimax 
with  constant  risk  (p+l)/k.  We  used  (1.3)  in  [1]  to  derive  a multi- 
variate empirical  Bayes  estimator,  a generalization  of  the  James-Stein 
estimator  [3],  for  cases  p > 2. 


•k 
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In  the  first  main  theorem  we  show  that  a uniformly  better 
estimator  than  (1.3)  if  p > 2 is 


(1.4) 


- (k-p-l)S-1  + ■ 


“-I  . 


ote  that  increases  (1.3)  by  an  amount  proportional  to  the  estimator 


(1.5) 


y-1  = -B*~2  T 

“ trTsT  i 


which  is  the  best  unbiased  estimator  of  E when  X is  known  to  be  pro- 
portional to  the  identity  matrix.  The  risk  functions  of  these  estimators 
and  their  mixtures. 


(1.6) 


E-1  = (l-oOE'1  + ctET1  0 < a < 1, 
—a  —0  ~1  — — 


which  are  also  of  interest,  are  considered  in  Secs.  3,  5. 

We  show  in  the  other  main  theorem,  Sec.  4,  that  the  empirical  Bayes 
estimators  derived  from  (1.6)  are  minimax,  all  dominating  the  maximum 
likelihood  estimator  X of  a pxk  matrix  of  means  for  fixed  _0.  The 
case  cv  = 1 corresponds  to  the  James-Stein  estimator  applied  to  all  pk 
values  9 simultaneously  while  the  new  estimator  with  a = 0 uniformly 

ij 

improves  the  multivariate  empirical  Bayes  estimator  of  [1]. 
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2.  THE  RELATIONSHIP  BETWEEN  MULTIVARIATE  EMPIRICAL  BAYES  ESTIMATION 
AND  ESTIMATING  THE  INVERSE  OF _A  COVARIANCE  MATRIX  ' 

Given  k independent  p-dimensional  normal  column  vectors, 

X.  , X,  , with  X.  having  conditional  mean  vector  0.  and  the  identity 

— 1 ~1 

covariance  matrix  _I, 

(2.1)  X.|Q.'d  N (0.,  I)  i = 1,  ....  k 

— l'—i  p -l  ~ 

and  given  that  the  unknown  parameter  vectors  are  an  independent 
sample  from  a multivariate  normal  distribution  with  mean  zero  and 
covariance  matrix  A 

(2.2)  0i1~d  N (0,  A)  i = 1,  ....  k 

tnen  the  multivariate  Bayes  estimator  of  0^  with  respect  to  squared 
error  loss  is 

(2.3)  0*  = a - L_1)X.  1 = 1,  ....  k 

where  we  have  defined 

(2.4)  1.  - J.  + A . 

In  tiie  empirical  Bayes  situation  A and  l.  are  unknown, 

-1 

so  the  Bayes  estimator  (1.3)  cannot  be  computed.  The  matrix  ma> 
be  estimated,  however,  since  (2.1)  and  (2.2)  give  the  marginal  distri- 
bution 

(2.5)  X.  - Np(0,  I) 


■A  :t 


■Lift 


* ■ N 
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to  X^.  A complete  sufficient  statistic  for  estimating  E is  S = X 
having  the  Wishart  distribution  (1.1),  with  X being  the  pxk  matrix 
Qii » • • • > • 

If  we  estimate  the  pxk  matrix^)  = (8^,  _0^)  with  normalized 

squared  error  loss  function 

(2.6)  t(B,  0)  = E*  _E?  - (0.  . - 0.  ,)2 

~ - pk  i=l  J=1  ij  ij 

by  a rule  similar  to  (2.3),  of  the  form 

(2.7)  1 s Ci  “ If1)* 

0-1 

with  _E  depending  only  on  S,  then  the  risk  R of  (2.7),  which  is 
computed  by  averaging  (2.6)  over  both  distributions  (2.1)  and  (2.2), 
may  be  written 

* 0 * -1  '•-l 

(2.8)  R = R + (R  - R )EL(E  , E ; £) . 

Here  R^  = 1 is  the  risk  of  the  maximum  likelihood  estimator  _0  = _X  with 

~-l  * -i 

E = j),  R = 1 - tr(E  )/p  is  the  risk  of  the  Bayes  estimator  (2.3) 

~_1  _i 

with  E = E known,  and  L(E  , E ; S)  is  the  loss  function  (1.2). 

The  proof  of  (2.8)  follows  easily  by  averaging  l first  over  its  con- 
ditional distribution 

(2.9)  9 . | X - N ((I  - E_1)X.,  I - Zf1), 
as  shown  in  [1,  Lemma  1]. 

The  problem  of  evaluating  multivariate  empirical  Bayes  estimators 
in  this  situation  reduces  to  evaluating  estimators  of  the  inverse  of  an 
unknown  covariance  martix  I because  R°  and  R*  are  unaffected  by  the 
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particular  estimator  i-  under  consideration  and  because  the  risk 

EL(L  , £ » _S)  » called  the  "relative  savings  loss"  in  [1],  only  involves 

an  expectation  over  S having  the  Wishart  distribution  (1.1). 
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3.  AN  ESTIMATOR  OF  THE  COVARIANCE  MATRIX  £ WHICH  DOMINATES  ANY 
MULTIPLE  OF  S 

Assume  Lhe  distribution  (1.1)  and  the  loss  function  (1.2).  We 

«-l  -1 

consider  estimators  £ of  the  form  (1.6).  Denote  u s tr(£  )/p  and  let 


(3.1) 


9 = 


k E f-k-~2-  . 
u;  trO>) 


We  will  show  in  Sec.  5 that  0 < cp  < 1 for  all  £ and  also  that 


(3.2) 


tr(£_1S) 
tr(S)  * 


In  the  special  case  £ = ol,  the  maximum  value  cp  = 1 is  attained.  Denote 
c = (p2+p-2)/ (pk-2)  so  0 < c < 1 and  0 < c < 1 if  both  p > 1 and  k > p + 1. 


Theorem  1. 


The  risk  of 


is 


(3.3) 


R 

O' 


In  particular, 
(3.4) 


= EL(£_1 , £_1;  S) 


p+1  k-p-1  2 pk-2 , , 

- ^ a - ^-(c+a-cof) 


7--1 


£^  is  minimax,  having  risk 


£±1 


>k-2  2 

7i T c 01 


2 

cp. 


which  is  uniformly  smaller  than  the  risk  (p+l)/k  of  the  best  multiple  of 
S_1,  (k-p-l)S-1. 

o/  /v 

Proof . We  compute  the  risk  of 
(3.5)  if1  = aS_1  + bl/tr (S) 

from  (1.2)  as 
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^ E tr(aS-1  + bI/tr(S)  - jfVs 


a ^ ,„-l,  , 2a b „ 1 

— T-  Etr (S  ) + i L - — 7— r 

prcoi  — kiu  tr(S) 


2a 

k 


(3.6) 


jdt 

1 

2b 

tr  (2. 

17 

S) 

1 

pkuj 

tr(S) 

pkm 

tr 

(S)  + 

pkiu 

2 

a 

2ab 

b2 

;-p-l) 

T k(pk-2)  * 

■ r + 

pk (pk- 

^-9 

E tr(lf2S) 


2b 


where  we  liave  used  (3.1),  (3.2)  and  E(k-p-l)S  = ^ . The  minimizing 

value  of  b is  obtained  by  differentiating  (3.6)  and  is  b = pk-2-ap 
which  is  independent  of  the  unknown  parameters.  Inserting  b into  (3.6) 
and  simplifying  gives 


(3.7) 


_ E±1  , (k-p-l-a) ‘ 
k k(k-p-l) 


(pk-2-ap) 

pk(pk-2) 


Reparameterizing  with  a = (k-p-1) (l-o)  and  substituting  this  value  into 
(3.7)  yields  (3.3).  Assertion  (3.4)  follows  by  setting  a = 0 in  (3.3). 
The  proof  is  complete. 

Discussion.  If  cp  is  known,  R is  minimized  at 


(3.8) 


O'  = ccp/ [ l-cpfccp] 


which  increases  monotonically  from  0 to  1 as  cp  increases  from  0 to  1 . 
Then  the  risk  is 


(3.9) 


R * = o*  + (l-o*)  ^ 
o pk  k 


The  case  cp  = 1 (Z  proportional  to  the  identity)  u-  = 1 yields  the  rule 
(1.5)  as  an  estimate.  More  generally,  if  a prior  distribution  on  L is 
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given,  then  the  rule  of  the  form  (3.5)  that  minimizes  the  average  risk 
takes  the  form  (1.6)  with 

(3.10)  a = cEcp/ [ 1-EcfH-cEcp] , 

which  depends  only  on  the  a priori  mean  Ecp  of  cp.  R **  then  is  given  by 

* -k-k 

(3.9)  with  o replaced  by  ot  . These  facts  are  proven  by  averaging 
(3.3)  over  the  prior  distribution,  and  then  by  differentiating  (3.3), 
perhaps  most  easily  in  the  form 

(3.11)  R = [ (l-c)a^-  (c+a-coO^Ecp]  . 

a k pk 

The  minimal  complete  subclass  of  the  class  of  all  rules  of  the 
form  (3.5)  with  < a,  b < “>  is  the  class  of  rules  Z ^ (1.6)  with 

~Ct 

0 < a < 1.  Thus  a = (k-p-1)  (1-or)  and  b = (pk-2)  (c+o-cor)  from  the  proof 

— 1 

of  Theorem  1.  To  show  that  the  rules  Z with  0 < a < 1 are  a complete 

~Q!  — — 

class,  we  note  for  any  fixed  cp  that  R is  strictly  convex  with  minimizer 

a 

* * 

a satisfying  0 < a < 1.  Therefore,  the  risk  of  any  rule  with  a i [0,  1] 
may  be  decreased  for  all  cp  by  using  the  nearest  value  in  [0,  1]  to  or. 

■k 

These  rules  are  minimal  complete  since  the  minimizing  a (3.8)  is  an 
invertible  function  of  cp. 

There  are  many  minimax  estimators  (rules  with  risk  not  exceeding 

A_1 

(p+l)/k)  in  the  class  (3.5).  The  best  such  estimator  is  Z because 

~0 

the  minimax  estimators  must  have  a = k-p-1  to  perform  well  at  cp  = 0, 

2 

and  then  b = p +p-2  is  the  best  choice  for  b. 


r-r- 7" 


4.  USING  THE  COVARIANCE  ESTIMATORS  IN  A SIMULTANEOUS  ESTIMATION  PROBLEM 


In  the  context  of  Sec.  2,  we  are  suggesting  estimators  of  the  p*k 
matrix  _0  of  the  form 

(4.1)  e = (i  - 2"1)x 

' ' —a  ~ —a  — 

with  given  by  (1.6).  With  respect  to  the  squared  error  loss  function 
(2.6),  the  risk  of  _0  is  computed  by  averaging  over  both _X  and  £ distri- 
buted as  (2.1),  (2.2),  and  as  a function  of  4.  is 

(4.2)  E-t(0 , 0 ) = 1 - uu  + uuR 

' —a  ot 

which  derives  from  (2.8)  and  (3.3)  with  uu  s tr(T  ^)/p.  Since  we  may 
also  write 

(4.3)  uu  = (k-p-l)Etr(S  ^)/p 


and  use  (3.1)  to  provide  an  expression  for  uxp,  (4.3)  may  be  written  as 


(4.4) 


E^(0 


, jL)  = 1 ~ (l-Q'2)  Etr  (S_1) 


pk 


(Pk.  J)._  (c^_cq,)2e  - 1 


pk 


tr (S)  ■ 


Both  sides  of  (4.4)  involve  first  an  expectation  E over  the  distribution 
(2.1)  of  X given _0,  this  expectation  being  a function  of  A s _00'  only, 
followed  by  an  expectation  E^  over  the  distribution  (2.2)  of  _G  for  fixed 
A.  Since  the  family  of  distributions  of  A is  complete  for  A,  (4.4)  holds 
even  when  the  E^  expectation  is  removed,  proving  the  following  theorem. 
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(4.5) 


Theorem  2.  As  a function  of  0 the  rule  0 of  (4.1)  has  risk 

— —O' 

2 

EvtCe,  5 ) = 1 - Ck~p71?  (l-a2 *)E  tr(S_1) 


O' 


pk 


- <c-ta-c*)%  cTTsT  • 

Each  estimator  0 < a < 1 is  therefore  a minimax  esVmator  of  6 for  the 
squared  error  loss  function  (2.6)  and  has  risk  (4.5)  uniformly  lower  than 
the  unit  risk  of  the  maximum  likelihood  estimator  6 = X.  Expression  (4.5) 
provides  an  unbiased  estimate  of  the  risk  of  6 . 

~Q' 

The  James-Stein  estimator  is  the  rule  a = 1 with  risk 


(4.6) 


l _ (pk-2)4 


pk  0 tr  (S) 
The  particular  rule  with  or  = 0> 


(4.7) 


0Q  h (I  - (k-p-l)s' 


■1  _ p +p-2 


tr(S) 


TT  I)X, 


is  the  best  in  the  class  _0^  as  r = tr (86 1 ) -»  “>  and  improves  the  risk 
l-(k-p-l)  Egtr(S  ^)/pk  of  _0  = (I  - (k-p-l)_S  '*')_X  by  the  amount 


(4.8) 


2 2 
(p  +p-2r 

pk 


_0  tr (S) 


2 2 

The  improvement  (4.8)  is  largest  at  _0  = 0 where  it  is  (p  +p-2)  /pk(pk-2) . 

Bounds  on  the  last  term  of  (4.5),  (4.6),  and  (4.8)  may  be  computed  for 
any  0 from  the  fact  that 


(4.9) 


< E 


pk-2 


l4r/(pk-2)  - Q tr(S)  - 1-K/pk 


f ' • ■->- 


^sr.-rr 


Only  assertion  (4.9)  needs  proof.  Since  tr(S)  has  a non- 

2' 

central  chi-square  distribution  with  mean  pk+T , tr(S)  — it 

can  be  written  as  a Poisson  mixture  of  central  chi-squares  as  in  [5], 
2 

say  tr(S)  — Xpk+2j»  J ~ Poisson  with  mean  t/2.  Letting  Et  indicate 
expectation  with  respect  to  the  Poisson  distribution. 


(4.10) 


= E 


'_G  tr  (S)  t pk+2J-2 


and  the  left-hand  side  of  (4.9)  is  obtained  from  Jensen's  inequality. 
To  obtain  the  right-hand  inequality,  write  E l/(pk+2J-2)  as 


_L_  fl  _ g e~T/2(T/2)j  2j , 

pk-2  1 j=Q  j!  Pk+2j-2J 

and  notice  that  this  can  also  be  expressed  as  [1  - T E^  1/ (pk+2J) ] / (pk-2) 
Jensen's  inequality  E l/(pk+2J)  > l/(pk+T)  gives  the  result. 
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5.  RISK  FUNCTIONS  AND  THE  FUNCTION  cp 

We  will  now  give  a more  explicit  evaluation  of  the  function  9 which 
appears  in  the  risk  formula  (3.3).  Let  ....  Wp  be  independent 
random  variables  and  l ^ = W^/DW^ . Let  0^,  . ..,  o be  the  eigenvalues  of 
2,  uj  = tr(£  ^)/p  = L(l/Oj)/p  and  define 

(5.1)  tf5  - ECl?  . o.U.)'1  . 

Y u>  j“l  j j 

The  value  (5.1)  agrees  with  (3.2)  because  orthogonal  invariance  permits 

the  assumption  E diagonal  with  elements  o^,  ....  and  then  (3.2)  with 

Wi  = Sii^°i  reduces  to  ~ E(EWi/Ib1W1) , being  (5.1).  Because  IW^  is 

independent  of  (IL  , ....  U ), 
i P 


(*iUi> 


pk-2 

ZW. 

J 


(5.2) 


= A E Pk~2  = I F Pk-2 

o>  La.W.  ui  ^ tr (S) 
1 1 — 


establishing  the  equivalence  of  (5.1)  and  (3.1).  Note  0 < cp  < 1 since 

1/So  U.  < 2 U ,/a.  and  EZ  U./cj.  = - Z 1/a.  = uu. 
ii—  ii  lip  1 

Define 


(5.3)  p = p/uutr(Z) 

as  the  squared  cosine  of  the  angle  between  Z2  and  Z~2 , so  0 < p < 1. 
Jensen's  inequality  applied  to  (5.1)  shows  cp  > p.  We  have  bounds 

(5.4)  p < cp  < min  (1,  p) 


since  letting  tt^  = a^/Za^ 


in  (5.1)  gives 
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(5.5) 


_j__  . _i i — < 2 

^iul  ai  Enlui  ~ &i  ui  ' 


Taking  expectations  of  (5.5)  and  using  E 1/lL  = (kp-2)/(k-2)  for  all 
i proves  (5.4).  The  bounds  (5.4)  become  tight  as  k increases  and 
for  any  p 


(5.6) 


lim  cp^  = P • 
k-*c° 


The  index  henceforth  will  be  used  to  indicate  the  dependence  of  c p on 
k.  The  values  cp^  and  p are  unity  only  when  2 = orl,  i.e.,  only  when 
all  a are  equal,  and  the  lower  bound  of  (5.4)  is  the  better  approximation 
when  the  are  nearly  equal.  Dispersed  a ^ cause  cp^  and  p both  to  approach 
zero  with  the  upper  bound  of  (5.4)  being  attained  asymptotically  if  at 
least  one  o is  finite  and  one  o approaches  infinity. 

In  the  special  case  p = 2,  cp^  depends  on  2.  only  through  the  ratio 
X = °2^°i  the  lar8est  to  the  smallest  eigenvalue.  Then  values  of  cp^ 
are  generated  recursively  for  1 / 1 by 


(5.7) 


= 2 /a  / (A+l)  = /p,  cp2  = 2X  log(A)/ (\2-l) 


k-1 
^k  k-2 


4A 

(\-l)2 


a-Tk-2) 


k-1  p 
~ k-2  1-p 


(1“9k-2)  ’ k^3- 


Obviously  cp^  = 1 if  X = 1.  We  omit  the  proof  of  (5.7)  to  save  space. 

2 

The  limiting  value  of  cp^  as  k -»  00  is  p = 4A/(l+\)  . 

The  function  cp^  is  plotted  in  Figure  1 for  the  case  p = 2,  k = 6 
together  with  the  four  risks,  from  (3.6), 


■ t ■?. . 


for  o=0,  .25,  .50,  1.  Figure  1 illustrates  that  a = 0 is  best  if 
cp  = 0 and  a = 1 is  best  if  cp  = 1 as  confirmed  by  (3.8),  while  inter- 
mediate values  like  a = .25  and  a = .5  are  effective  compromises  if 
the  extremes  cp  = 0 or  cp  = 1 are  not  especially  likely.  It  is  tempting 
to  estimate  cp,  say  by  a function  C/[tr(S  ^)tr(S)],  C close  to 
p (pk-2) / (k-p-1) , and  to  use  this  to  determine  an  estimated  value  a 
from  (3.8).  In  the  situation  of  Figure  1,  for  example,  the  hope  would 
be  to  produce  a rule  with  risk  function  close  to  the  lower  envelope  of 
the  risk  functions  graphed.  Our  calculations  for  the  case  p = 2 show  that 
the  suggested  rule  works  fairly  well,  provided  cy  is  forced  to  be  less  than 
unity,  and  that  smaller  values  of  C could  be  better.  But  no  clear  guidelines 
for  the  use  of  such  "adaptive"  rules  are  available  at  this  time. 

The  improvement  of  the  rule  a = 0 over  the  best  multiple  of  S ^ 

is  measured  by  the  distance  between  the  curve  and  the  horizontal 

line  R = .5  in  Figure  1.  This  is  a 27  percent  improvement  in  risk  at 

A.  = 1;  larger  improvements  can  occur  in  cases  with  k large  and  p near  k. 

~-l  *-l 

For  any  p,  k,  has  lower  risk  than  L’^  provided  cp  < l/(l+c). 

This  holds  for  p = 2,  k = 6 provided  ^/k  > 1.90.  Note  that  Jk  is  the 
ratio  of  the  standard  deviations  of  the  major  and  the  minor  principal 
components  defined  by  the  two  rows  of  X. 
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6.  THE  RESTRICTION  £_1  < I 

We  know  £ ^ < I since  £ * I + A with  A nonnegative  definite,  but 

rw  rwi  rw  ^ 

the  estimators  of  (1.6)  do  not  obey  this  inequality.  This  undesirable 

*>_1 

feature  may  be  overcome  as  follows.  Diagonalize  = r'^AT  with  T a pxp 

orthogonal  matrix  and  A the  diagonal  matrix  of  eigenvalues  6^  A 

*—2^  * * 

preferred  estimate  is  £ = T'A  T with  6.  = min(l,  6.),  i = 1,  ....  p 

~ l i ’ r 

*-l 

since  this  estimate  satisfies  the  restriction  £ <1.  The  loss  function 

—a 

(1.2)  is  either  unchanged  or  reduced  for  every  S,  £ by  this  modification. 


(6.1) 


for  all  J3. 

*_1 

The  improved  estimator  £ has  risk  uniformly  lower  than  R of 

—a  J a 

(3.3)  because  of  (6,1).  In  the  simultaneous  estimation  context  of  Sec.  4, 
the  estimator 


(6.2) 


* 

e = u - £ )x 


—a 


—a 


therefore  has  risk  as  a function  of  A,  EE  4(0,  0 ),  strictly  lower  Chan 

A O —C? 

' * 

(4.4).  The  risk  as  a function  of  _9.  Eg4(0,  _0^) , is  likely  to  be  lower 
than  (4.5)  for  all  J),  and  is  known  to  be  for  p = 1.  This  conjecture 
is  not  proved  for  p > 2 however  because  the  completeness  argument 
used  to  establish  (4.5)  does  not  apply  with  0 (there  is  no  convenient 
expression  for  its  risk  as  a function  of  A) . 

The  proof  of  (6.1)  notes  the  convexity  of  the  set  of  matrices 
0 < £ ^ < I,  the  fact  that  the  loss  function  L is  a metric  derived 


r 


-1 7- 


from  an  Euclidean  inner  product,  and  that  in  this  metric  is  the 

-~a 

closest  matrix  in  the  convex  set  to  1 ^ The  precise  argument  is 


given  in  [1,  Sec.  6]. 


-18- 


7.  DISCUSSION 

— 1 

The  fact  that  dominates  the  best  fully  invariant  estimator 
-1  -1  ^ 

(k-p-l)S  of  zJ  for  our  not  fully  invariant  loss  function  suggests 
that  shrinking  the  best  multiple  of  S toward  the  identity  matrix  may 
be  effective  in  more  general  situations  of  estimating  a covariance  matrix. 
All  of  the  estimators  of  £ in  this  paper  are  orthogonally  invariant,  of 
the  form 


(7.1) 


z.(s)  = r'Sr 


with  F the  matrix  of  eigenvectors  of  _S,  say  S = F'Dr,  D diagonal,  and 
jo  a diagonal  matrix  whose  entries  are  functions  of  the  eigenvalues 
D of  S,  o = 3(D).  Explicitly,  the  best  linear  multiple  of  S,  £(S)  = S/(k-p-l), 
estimates  the  i-th  eigenvalue  of  z.  by  5^  = d^/(k-p-l) , while 
£n  = ( (k-p-l)J>  1 + (p“+p-2) I/tr(S) ) 1 uses 


(7.2) 


S<0)  = 


2,  , d.  °i 

1 + (£_+Er2)  JL- 

''k-p-l  ’ 2,d. 

J 


so  improves  on  by  shrinking  all  the  estimated  eigenvalues  toward 

zero,  the  larger  eigenvalues  being  shrunk  proportionately  more  than  the 
smaller.  This  is  reminiscent  of  the  James-Stein  estimator  of  k means 
[3],  and  the  basic  phenomenon  seems  to  be  the  same:  the  eigenvalues  of 

_S,  considered  as  an  ensemble  of  p numbers,  are  distorted  in  a systematic 
nonlinear  way  from  the  eigenvalues  of  z..  A universally  improved  estimator 
is  obtained  by  undoing  this  distortion. 

For  the  general  problem  of  estimating  a covariance  matrix,  it  would 
be  more  satisfying  to  show  that  estimators  of  the  form 
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(7.3)  Z - (aS  1 + bI/tr(S))-1 

^ ^ ~ 

dominate  the  best  fully  invariant  estimator  of  L when  the  loss  function 
is  also  fully  invariant,  but  the  computations  are  difficult  for  such 
loss  functions.  The  loss  function  used  here  leads  to  nicely  computable 
risk  expressions  for  rules  of  the  form  (7.3),  permitting  a comparison  of 
their  operating  characteristics,  and  more  importantly  snowing' where  the 
additional  information  lies  for  improving  the  best  fully  invariant  esti- 
mator. It  also  has  the  virtue  of  arising  naturally  from  the  squared  error 
estimation  problem  for  0. 

In  Section  5 of  [3],  Stein  considered  an  example  with  a fully 
invariant  loss  function  and  found  a constant-risk  estimator  (invariant 
under  the  lower  triangular  group  of  matrices,  but  not  orthogonally 
invariant)  which  is  uniformly  better  than  the  best  fully  invariant 
estimator.  The  expected  value  of  his  estimator,  like  here,  is  always 
closer  to  0 than  the  mean  of  the  best  fully  invariant  estimator.  He  has 
recently  made  further  progress  on  the  problem  of  covariance  estimation  by 
using  a method  for  finding  unbiased  estimators  of  the  risk  function  [7]. 

In  the  empirical  Bayes  and  the  simultaneous  estimation  of  means 
situations  the  loss  function  L is  natural,  as  the  derivation  in  Sec.  2 
shows,  and  the  simple  estimators  of  0 (2.7)  based  on  the  form  (7.3)  have 
computable  risks.  This  simplicity  also  leads  to  risk  expressions  as  a 
function  of  0 (Theorem  2)  and  yields  unbiased  estimates  of  the  risk.  These 
estimators  may  be  criticized  for  being  inadmissible  since  they  ignore 
the  restriction  L ^ < I.  The  rules  of  Sec.  6 may  be  nearly  admissible 
though,  at  least  in  the  case  p = 1 they  reduce  to  the  James-Stein  positive- 
part  estimator  for  which  no  uniform  improvement  has  ever  been  offered. 


% 
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Orthogonally  invariant  estimators  of  0 take  the  form  (2.7)  with 
£ as  in  (7.1),  and  are  not  necessarily  of  the  form  (7.3).  One  approach 
to  finding  alternatives  to  (7.3)  was  suggested  at  the  end  of  Sec.  5. 

Stein  [7]  offers  another  method  by  producing  unbiased  estimates  of  the 
risk  of  arbitrary  orthogonally  invariant  rules.  Other  rules  having  this 
orthogonality  property  are  offered  by  Gollob  [2]  and  Mandel  [4].  Their 
estimates  of  _0  correspond  to  using  (7.1)  in  (2.7)  where  l/o^  = 1 if  d. 
fails  to  pass  a significance  test  and  otherwise  is  zero,  forcing  0 < L ^ 
When  p = 1 their  rule  is  equivalent  to  estimation  following  a preliminary 
test  that  ||_0j|  = 0,  a procedure  that  is  known  not  to  be  minimax  and  to  be 
uniformly  dominated  by  some  positive-part  version  of  the  James-Stein 
estimator  [6] . 
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